A neural architecture for computing acoustic-phonetic invariants

Author

  • Elaine Tsiang
Abstract

The proposed neural architecture consists of an analytic lower net and a synthetic upper net. This paper focuses on the upper net. The lower net performs a 2D multiresolution wavelet decomposition of an initial spectral representation to yield a multichannel representation of local frequency modulations at multiple scales. From this representation, the upper net synthesizes increasingly complex features, resulting in a set of acoustic observables at the top layer with multiscale context dependence. The upper net also provides invariance under frequency shifts and under dilatations of tone intervals and time intervals by building these transformations into the architecture. Applied to the recognition of gross and fine phonetic categories in continuous speech from diverse speakers, the architecture provides high accuracy and strong generalization from modest amounts of training data.
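
To make the lower net's role concrete, the following is a minimal sketch of a 2D multiresolution decomposition of a small time-frequency grid. It is illustrative only: the abstract does not name a wavelet family, so the Haar filter used here is an assumption, as are the row/column layout (rows as frequency channels, columns as time frames) and the hypothetical function names `haar_step_1d`, `haar_step_2d`, and `multiresolution`.

```python
def haar_step_1d(values):
    """One Haar analysis step: pairwise averages and pairwise half-differences."""
    half = len(values) // 2
    approx = [(values[2 * i] + values[2 * i + 1]) / 2.0 for i in range(half)]
    detail = [(values[2 * i] - values[2 * i + 1]) / 2.0 for i in range(half)]
    return approx, detail


def _along_frequency(rows):
    """Apply one Haar step down each column (across frequency channels)."""
    lo_cols, hi_cols = [], []
    for col in zip(*rows):
        a, d = haar_step_1d(list(col))
        lo_cols.append(a)
        hi_cols.append(d)
    # Transpose back so that rows index frequency again.
    lo = [list(r) for r in zip(*lo_cols)]
    hi = [list(r) for r in zip(*hi_cols)]
    return lo, hi


def haar_step_2d(grid):
    """One 2D Haar step on a grid with even dimensions.

    Assumed layout: rows are frequency channels, columns are time frames.
    Returns (approx, time_detail, freq_detail, diag_detail).
    """
    row_lo, row_hi = [], []
    for row in grid:
        a, d = haar_step_1d(row)  # filter along time within each channel
        row_lo.append(a)
        row_hi.append(d)
    approx, freq_detail = _along_frequency(row_lo)
    time_detail, diag_detail = _along_frequency(row_hi)
    return approx, time_detail, freq_detail, diag_detail


def multiresolution(grid, levels):
    """Repeat the 2D step on the approximation to get detail channels per scale."""
    channels = []
    current = grid
    for level in range(1, levels + 1):
        current, t, f, d = haar_step_2d(current)
        channels.append({"level": level, "time": t, "frequency": f, "diagonal": d})
    return channels, current


# A toy 4x4 "spectrogram" whose energy alternates between adjacent channels.
toy = [[0, 0, 0, 0], [2, 2, 2, 2], [0, 0, 0, 0], [2, 2, 2, 2]]
channels, residual = multiresolution(toy, 2)
```

Each level contributes three detail channels (variation along time, along frequency, and along both), which is one way to read "a multichannel representation of local frequency modulations at multiple scales". The toy input varies only across frequency, so only the level-1 `frequency` channel is nonzero.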

Publication date: 1998